Goto

Collaborating Authors

 lyric generation


Song Form-aware Full-Song Text-to-Lyrics Generation with Multi-Level Granularity Syllable Count Control

Chae, Yunkee, Shin, Eunsik, Suntae, Hwang, Paik, Seungryeol, Lee, Kyogu

arXiv.org Artificial Intelligence

Lyrics generation presents unique challenges, particularly in achieving precise syllable control while adhering to song form structures such as verses and choruses. Conventional line-by-line approaches often lead to unnatural phrasing, underscoring the need for more granular syllable management. We propose a framework for lyrics generation that enables multi-level syllable control at the word, phrase, line, and paragraph levels, aware of song form. Our approach generates complete lyrics conditioned on input text and song form, ensuring alignment with specified syllable constraints. Generated lyrics samples are available at: https://tinyurl.com/lyrics9999


Agent-Driven Large Language Models for Mandarin Lyric Generation

Liu, Hong-Hsiang, Liu, Yi-Wen

arXiv.org Artificial Intelligence

Generative Large Language Models have shown impressive in-context learning abilities, performing well across various tasks with just a prompt. Previous melody-to-lyric research has been limited by scarce high-quality aligned data and unclear standard for creativeness. Most efforts focused on general themes or emotions, which are less valuable given current language model capabilities. In tonal contour languages like Mandarin, pitch contours are influenced by both melody and tone, leading to variations in lyric-melody fit. Our study, validated by the Mpop600 dataset, confirms that lyricists and melody writers consider this fit during their composition process. In this research, we developed a multi-agent system that decomposes the melody-to-lyric task into sub-tasks, with each agent controlling rhyme, syllable count, lyric-melody alignment, and consistency. Listening tests were conducted via a diffusion-based singing voice synthesizer to evaluate the quality of lyrics generated by different agent groups.


LyCon: Lyrics Reconstruction from the Bag-of-Words Using Large Language Models

Kim, Haven, Choi, Kahyun

arXiv.org Artificial Intelligence

This paper addresses the unique challenge of conducting research in lyric studies, where direct use of lyrics is often restricted due to copyright concerns. Unlike typical data, internet-sourced lyrics are frequently protected under copyright law, necessitating alternative approaches. Our study introduces a novel method for generating copyright-free lyrics from publicly available Bag-of-Words (BoW) datasets, which contain the vocabulary of lyrics but not the lyrics themselves. Utilizing metadata associated with BoW datasets and large language models, we successfully reconstructed lyrics. We have compiled and made available a dataset of reconstructed lyrics, LyCon, aligned with metadata from renowned sources including the Million Song Dataset, Deezer Mood Detection Dataset, and AllMusic Genre Dataset, available for public access. We believe that the integration of metadata such as mood annotations or genres enables a variety of academic experiments on lyrics, such as conditional lyric generation.


Encoder-Decoder Framework for Interactive Free Verses with Generation with Controllable High-Quality Rhyming

Pasini, Tommaso, López-Ávila, Alejo, Quteineh, Husam, Lampouras, Gerasimos, Du, Jinhua, Wang, Yubing, Li, Ze, Sun, Yusen

arXiv.org Artificial Intelligence

Composing poetry or lyrics involves several creative factors, but a challenging aspect of generation is the adherence to a more or less strict metric and rhyming pattern. To address this challenge specifically, previous work on the task has mainly focused on reverse language modeling, which brings the critical selection of each rhyming word to the forefront of each verse. On the other hand, reversing the word order requires that models be trained from scratch with this task-specific goal and cannot take advantage of transfer learning from a Pretrained Language Model (PLM). We propose a novel fine-tuning approach that prepends the rhyming word at the start of each lyric, which allows the critical rhyming decision to be made before the model commits to the content of the lyric (as during reverse language modeling), but maintains compatibility with the word order of regular PLMs as the lyric itself is still generated in left-to-right order. We conducted extensive experiments to compare this fine-tuning against the current state-of-the-art strategies for rhyming, finding that our approach generates more readable text and better rhyming capabilities. Furthermore, we furnish a high-quality dataset in English and 12 other languages, analyse the approach's feasibility in a multilingual context, provide extensive experimental results shedding light on good and bad practices for lyrics generation, and propose metrics to compare methods in the future.


Syllable-level lyrics generation from melody exploiting character-level language model

Zhang, Zhe, Lasocki, Karol, Yu, Yi, Takasu, Atsuhiro

arXiv.org Artificial Intelligence

The generation of lyrics tightly connected to accompanying melodies involves establishing a mapping between musical notes and syllables of lyrics. This process requires a deep understanding of music constraints and semantic patterns at syllable-level, word-level, and sentence-level semantic meanings. However, pre-trained language models specifically designed at the syllable level are publicly unavailable. To solve these challenging issues, we propose to exploit fine-tuning character-level language models for syllable-level lyrics generation from symbolic melody. In particular, our method endeavors to incorporate linguistic knowledge of the language model into the beam search process of a syllable-level Transformer generator network. Additionally, by exploring ChatGPT-based evaluation for generated lyrics, along with human subjective evaluation, we demonstrate that our approach enhances the coherence and correctness of the generated lyrics, eliminating the need to train expensive new language models.


Unsupervised Melody-to-Lyric Generation

Tian, Yufei, Narayan-Chen, Anjali, Oraby, Shereen, Cervone, Alessandra, Sigurdsson, Gunnar, Tao, Chenyang, Zhao, Wenbo, Chen, Yiwen, Chung, Tagyoung, Huang, Jing, Peng, Nanyun

arXiv.org Artificial Intelligence

Automatic melody-to-lyric generation is a task in which song lyrics are generated to go with a given melody. It is of significant practical interest and more challenging than unconstrained lyric generation as the music imposes additional constraints onto the lyrics. The training data is limited as most songs are copyrighted, resulting in models that underfit the complicated cross-modal relationship between melody and lyrics. In this work, we propose a method for generating high-quality lyrics without training on any aligned melody-lyric data. Specifically, we design a hierarchical lyric generation framework that first generates a song outline and second the complete lyrics. The framework enables disentanglement of training (based purely on text) from inference (melody-guided text generation) to circumvent the shortage of parallel data. We leverage the segmentation and rhythm alignment between melody and lyrics to compile the given melody into decoding constraints as guidance during inference. The two-step hierarchical design also enables content control via the lyric outline, a much-desired feature for democratizing collaborative song creation. Experimental results show that our model can generate high-quality lyrics that are more on-topic, singable, intelligible, and coherent than strong baselines, for example SongMASS, a SOTA model trained on a parallel dataset, with a 24% relative overall quality improvement based on human ratings.


SongRewriter: A Chinese Song Rewriting System with Controllable Content and Rhyme Scheme

Sun, Yusen, Li, Liangyou, Liu, Qun, Yeung, Dit-Yan

arXiv.org Artificial Intelligence

Although lyrics generation has achieved significant progress in recent years, it has limited practical applications because the generated lyrics cannot be performed without composing compatible melodies. In this work, we bridge this practical gap by proposing a song rewriting system which rewrites the lyrics of an existing song such that the generated lyrics are compatible with the rhythm of the existing melody and thus singable. In particular, we propose SongRewriter,a controllable Chinese lyrics generation and editing system which assists users without prior knowledge of melody composition. The system is trained by a randomized multi-level masking strategy which produces a unified model for generating entirely new lyrics or editing a few fragments. To improve the controllabiliy of the generation process, we further incorporate a keyword prompt to control the lexical choices of the content and propose novel decoding constraints and a vowel modeling task to enable flexible end and internal rhyme schemes. While prior rhyming metrics are mainly for rap lyrics, we propose three novel rhyming evaluation metrics for song lyrics. Both automatic and human evaluations show that the proposed model performs better than the state-of-the-art models in both contents and rhyming quality.


Controllable Ancient Chinese Lyrics Generation Based on Phrase Prototype Retrieving

Yi, Li

arXiv.org Artificial Intelligence

Generating lyrics and poems is one of the essential downstream tasks in the Natural Language Processing (NLP) field. Current methods have performed well in some lyrics generation scenarios but need further improvements in tasks requiring fine control. We propose a novel method for generating ancient Chinese lyrics (Song Ci), a type of ancient lyrics that involves precise control of song structure. The system is equipped with a phrase retriever and a phrase connector. Based on an input prompt, the phrase retriever picks phrases from a database to construct a phrase pool. The phrase connector then selects a series of phrases from the phrase pool that minimizes a multi-term loss function that considers rhyme, song structure, and fluency. Experimental results show that our method can generate high-quality ancient Chinese lyrics while performing well on topic and song structure control. We also expect our approach to be generalized to other lyrics-generating tasks.


Multi-Modal Experience Inspired AI Creation

Cao, Qian, Chen, Xu, Song, Ruihua, Jiang, Hao, Yang, Guang, Cao, Zhao

arXiv.org Artificial Intelligence

AI creation, such as poem or lyrics generation, has attracted increasing attention from both industry and academic communities, with many promising models proposed in the past few years. Existing methods usually estimate the outputs based on single and independent visual or textual information. However, in reality, humans usually make creations according to their experiences, which may involve different modalities and be sequentially correlated. To model such human capabilities, in this paper, we define and solve a novel AI creation problem based on human experiences. More specifically, we study how to generate texts based on sequential multi-modal information. Compared with the previous works, this task is much more difficult because the designed model has to well understand and adapt the semantics among different modalities and effectively convert them into the output in a sequential manner. To alleviate these difficulties, we firstly design a multi-channel sequence-to-sequence architecture equipped with a multi-modal attention network. For more effective optimization, we then propose a curriculum negative sampling strategy tailored for the sequential inputs. To benchmark this problem and demonstrate the effectiveness of our model, we manually labeled a new multi-modal experience dataset. With this dataset, we conduct extensive experiments by comparing our model with a series of representative baselines, where we can demonstrate significant improvements in our model based on both automatic and human-centered metrics. The code and data are available at: \url{https://github.com/Aman-4-Real/MMTG}.


LyricJam: A system for generating lyrics for live instrumental music

Vechtomova, Olga, Sahu, Gaurav, Kumar, Dhruv

arXiv.org Artificial Intelligence

We describe a real-time system that receives a live audio stream from a jam session and generates lyric lines that are congruent with the live music being played. Two novel approaches are proposed to align the learned latent spaces of audio and text representations that allow the system to generate novel lyric lines matching live instrumental music. One approach is based on adversarial alignment of latent representations of audio and lyrics, while the other approach learns to transfer the topology from the music latent space to the lyric latent space. A user study with music artists using the system showed that the system was useful not only in lyric composition, but also encouraged the artists to improvise and find new musical expressions. Another user study demonstrated that users preferred the lines generated using the proposed methods to the lines generated by a baseline model.